Baltimore Ceasefire 365 is a city-wide call asking Baltimore residents to avoid having any murders through quarterly Ceasefires and Peace Challenges (February, May, August, and November).
In this post, we use open data and R to look at the distribution of shootings in space and time, and model the impact of the Ceasefires.
Baltimore releases detailed data on issues relevant to the city, including crime. Here’s the raw numbers of shootings, plotted for each day.
library(tidyverse)
library(scales)
bpd <- read_csv("/Users/peterphalen/Documents/ceasefire/BPD_Part_1_Victim_Based_Crime_Data.csv")
# subset to shootings or homicides with a firearm
bpd <- subset(bpd, Description == "SHOOTING" |
(Description == "HOMICIDE" & Weapon == "FIREARM"))
bpd$CrimeDate <- as.Date(bpd$CrimeDate, format = "%m/%d/%Y")
# there are many crimes per day. collapse to daily counts
daily <- bpd %>% group_by(CrimeDate) %>% summarise(shootings = n())
# fill missing dates, because some had no shootings
full.ts <- data.frame(CrimeDate = seq(daily$CrimeDate[1],
daily$CrimeDate[nrow(daily)], by="day"))
daily <- full_join(full.ts,daily)
daily <- daily %>% group_by(CrimeDate) %>% mutate_all(funs(ifelse(is.na(.),0,.)))
ggplot(daily) +
aes(x=CrimeDate, y=jitter(shootings, .2)) +
geom_point(alpha=.2) +
xlab("date") +
ylab("daily shooting deaths (vertically jittered)") +
scale_y_continuous(breaks=0:6) +
scale_x_date(labels = date_format("%b %Y")) +
ggtitle(" ",
subtitle="Baltimore (2012-present)")
You can tap neighborhoods to see exact numbers.
This map shows the raw count of murders in Baltimore since 2012 by neighborhood.
library(geojsonio)
library(leaflet)
bpd <- subset(bpd, !is.na(Neighborhood))
# count by neighobrhood
count <- bpd %>%
group_by(Neighborhood) %>%
summarise(total.count=n())
# get polygons to draw neighborhood maps
nbds <- geojsonio::geojson_read("/Users/peterphalen/Documents/ceasefire/Neighborhoods.geojson", what = "sp")
get_shooting_count <- function(neighborhood){
nbd <- as.character(neighborhood)
if(nbd %in% count$Neighborhood){
count <- count[count$Neighborhood == nbd,]$total.count
return(count)
}
if(!(nbd %in% count$Neighborhood)){
return(0)
}
}
nbds$count <- sapply(nbds$Name, get_shooting_count)
# draw legend
range.count <- range(nbds$count,na.rm=T)
labs <- c(0,50,100,150,200,250)
pal.crime <- colorNumeric(colorRamp(c('#ccccff', 'red')), labs)
leaflet(nbds) %>%
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addPolygons(stroke=T,
weight=1,
popup=paste0(nbds$Name,"<br/>Shootings: ",nbds$count),
color=~pal.crime(count),
fillOpacity=.5) %>%
addLegend("bottomright",title="# of shootings (2012-present)",colors=~pal.crime(labs),labels=~labs)
This version adjusts by the population of each neighborhood. Be careful about the super bright red areas, because some of them have very few residents and so even a single shooting will make them look really dangerous even though they’re probably not. (You can tap neighborhoods to see the number of residents.)
#--------- population-adjusted --------------#
nbds$per1k <- nbds$count / nbds$Population * 1000
nbds$per1k <- round(nbds$per1k)
nbds$per1k <- ifelse(nbds$Population == 0, NA, nbds$per1k)
labs <- c(0,25,50,75)
pal.crime <- colorNumeric(colorRamp(c('#ccccff', 'red')),
labs,
na.color = "#b2b2b2")
countlabel <- paste0(nbds$Name,"<br/>",nbds$count," shootings among ",nbds$Population," residents")
nbds$countlabel <- ifelse(nbds$Population == 0, paste0(nbds$Name,":<br/>","No residents"), countlabel)
leaflet(nbds) %>% #draw population-adjusted map,
#areas with 0 residents are greyed
#out but can still be clicked
addProviderTiles(providers$CartoDB.PositronNoLabels) %>%
addPolygons(stroke=T,
weight=1,
popup=nbds$countlabel,
color=~pal.crime(per1k),
fillOpacity=.6) %>%
addLegend("bottomright",title="Shootings per one</br>thousand residents</br>(2012-present)",colors=~pal.crime(labs),labels=~labs)
Ceasefires have been called four times per year since August 2017. These are “ceasefire weekends” but their impact often extends well beyond a few days. One lasted twelve.
# first day (Friday) of ceasefire weekends
ceasefire.initial <-
as.Date(
c("08/04/2017",
"11/03/2017",
"02/02/2018",
"05/11/2018",
"08/03/2018",
"11/02/2018",
"02/01/2019",
"05/10/2019",
"08/02/2019"),
format="%m/%d/%Y")
ceasefire.weekends <-
lapply(ceasefire.initial,
function(x){
seq(from=x,
by="day",
length.out=3)
}
)
ceasefire.weekends <- do.call("c",
ceasefire.weekends)
We want to include information about the date, the day of the week (Mon-Sun), the day of the year (1-365), and a binary variable indicating whether a date occurs during ceasefire.
library(lubridate)
# the julian calendar is a simple system for numeric dates
daily$jul <- julian(daily$CrimeDate)
daily$weekday <- factor(weekdays(daily$CrimeDate),
levels=c("Monday","Tuesday","Wednesday","Thursday",
"Friday","Saturday","Sunday"))
daily$seasonal <- yday(daily$CrimeDate)
daily$day.of.year <- date_format("%b-%d")(daily$CrimeDate)
daily$ceasefire <- factor(ifelse(daily$CrimeDate %in% ceasefire.weekends, 1, 0),
labels=c("Regular Day","Ceasefire Weekend"))
We’ll predict shootings using: * spline time trend + splines are estimated to smooth noisy data * cyclical spline for yearly seasonality + the cyclical constraint requires the seasonal effect to begin where it ended * varying intercepts for day of the week + in case weekends have different patterns of shootings than weekdays * varying intercept for each day of the year + in case “special days”, like mothers day, show different patterns of shootings * binary indicator for the ceasefire
We use a Poisson link function because the outcome is a count and the mean is about the same as sd.
library(rstanarm)
model <- stan_gamm4(shootings ~
s(jul) +
s(seasonal,
bs="cc") + #cyclical constraint
ceasefire,
random= ~ (1 | weekday) +
(1 | day.of.year), # every day....
data=daily,
cores=4,
iter=1000,
family=poisson)
Here is a plot of the model against the observations. The red line represents the average. The grey area is the 80% predictive interval, which means that about 80% of days will have shooting counts that fall within the grey area. 10% of the time, the number of shootings will extend above the grey area
Ceasefires are visible as eight dramatic downward red spikes beginning in 2017.
daily$Estimate <- apply(posterior_linpred(model, transform=TRUE),
2, median)
# 80% posterior predictive interval for main plot
preds <- posterior_predict(model, transform=TRUE)
preds <- apply(preds, 2, function(x){quantile(x, prob=c(.1, .9))})
daily$high <- preds["90%",]
daily$low <- preds["10%",]
daily %>%
ggplot(aes(x = CrimeDate, y = shootings)) +
geom_point(alpha=.2) +
geom_line(aes(y = Estimate), alpha=.5, color="red") +
geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
scale_y_continuous(breaks=c(0,4,8,12)) +
xlab("date") +
theme_bw()
We plot the marginal effects, showing the components that make up the above time series. The specific numbers of shootings are to some extent dependent upon the reference point, but these figures give you the right idea of the shape of the trends.
### Time trend plot
time.frame <- with(daily, # Ref: August
data.frame(
jul=jul,
weekday=0, # not used for this prediction
ceasefire="Regular Day",
day.of.year="Aug-01",
seasonal=yday(as.Date("2018-08-01"))))
post <- posterior_linpred(model,
newdata=time.frame,
transform=TRUE,
re.form = NA)
time.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
time.frame$low <- ci["2.5%",]
time.frame$high <- ci["97.5%",]
trend.axis.dates <- seq(from=as.Date("2012-01-01"),
by="year",
length.out=9)
time.plot <-
time.frame %>%
ggplot() +
aes(x=jul, y=Estimate) +
geom_line(aes(y = Estimate), alpha=.5) +
geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
xlab("Time trend") +
ylab("Shootings") +
ggtitle(" ") +
scale_x_continuous(
breaks=julian(trend.axis.dates),
labels=date_format("%m-%Y")(trend.axis.dates)) +
theme_bw()
time.plot
First, here’s the deseasonalized time trend. Shootings increased sometime in 2015 and have stayed high.
Now we’re going to code up all the seasonal graphs.
### day of year plot
doy.for.single.year <- date_format("%b-%d")(seq(as.Date("0000-01-01"), as.Date("0000-12-31"), by = 1))
exact.day.frame <- with(daily, # Ref: regular day in mid-2018
data.frame(
jul=julian(as.Date("2018-08-01"))[1],
weekday=0, # weekday not used for this prediction
ceasefire="Regular Day",
doy.index = 1:length(doy.for.single.year),
day.of.year=doy.for.single.year,
seasonal=100))
post <- posterior_linpred(model,
newdata=exact.day.frame,
transform=FALSE,
re.form = ~(1|day.of.year))
exact.day.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
exact.day.frame$low <- ci["2.5%",]
exact.day.frame$high <- ci["97.5%",]
doy.axis.dates <- seq(as.Date("0-01-01"),by="month",length.out=12)
largest.day <- which.max(exact.day.frame$Estimate)
largest.day.row <- exact.day.frame[largest.day,]
lowest.day <- which.min(exact.day.frame$Estimate)
lowest.day.row <- exact.day.frame[lowest.day,]
doy.plot <-
exact.day.frame %>%
ggplot() +
aes(x=doy.index, y=Estimate) +
geom_line(aes(y = Estimate), alpha=.5) +
geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
xlab("Day of year") +
ylab("Shootings\n(log scale)") +
ggtitle(" ") +
scale_x_continuous(
breaks=yday(c(doy.axis.dates, as.Date("0-12-31"))),
labels=date_format("%b-%d")(c(doy.axis.dates, as.Date("0-01-01")))
) +
#highest day
annotate("text", x = largest.day.row$doy.index,
y = largest.day.row$Estimate + .3,
label = paste("largest increase:",largest.day.row$day.of.year)) +
geom_segment(aes(
xend = largest.day.row$doy.index,
x = largest.day.row$doy.index + 5,
yend = largest.day.row$Estimate + .02,
y = largest.day.row$Estimate + .26),
arrow = arrow(length = unit(0.5, "cm"))) +
# lowest day
annotate("text", x = lowest.day.row$doy.index,
y = lowest.day.row$Estimate - .27,
label = paste("largest decrease:",lowest.day.row$day.of.year)) +
geom_segment(aes(
xend = lowest.day.row$doy.index,
x = lowest.day.row$doy.index + 5,
yend = lowest.day.row$Estimate - .02,
y = lowest.day.row$Estimate - .24),
arrow = arrow(length = unit(0.5, "cm"))) +
theme_bw()
### seasonal plot
seasonal.frame <- with(daily, # Ref: regular day in mid-2018
data.frame(
jul=julian(as.Date("2018-08-01"))[1],
weekday=0, # weekday not used for this prediction
ceasefire="Regular Day",
day.of.year=0,
seasonal=1:365))
post <- posterior_linpred(model,
newdata=seasonal.frame,
transform=TRUE,
re.form = NA)
seasonal.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
seasonal.frame$low <- ci["2.5%",]
seasonal.frame$high <- ci["97.5%",]
doy.axis.dates <- seq(as.Date("0-01-01"),by="month",length.out=12)
seasonal.plot <-
seasonal.frame %>%
ggplot() +
aes(x=seasonal, y=Estimate) +
geom_line(aes(y = Estimate), alpha=.5) +
geom_ribbon(aes(ymin=low, ymax=high), alpha=.2) +
xlab("Seasonal trend") +
ylab("Shootings") +
ggtitle(" ") +
scale_x_continuous(
breaks=yday(c(doy.axis.dates, as.Date("0-12-31"))),
labels=date_format("%b")(c(doy.axis.dates, as.Date("0-01-01")))
) +
theme_bw()
### Day of week plot
wday.frame <- with(daily, # Ref: regular day in August 2018
data.frame(
jul=julian(as.Date("2018-08-01"))[1],
weekday=unique(daily$weekday),
ceasefire="Regular Day",
day.of.year=yday(as.Date("2018-08-01")),
seasonal=yday(as.Date("2018-08-01"))))
post <- posterior_linpred(model,
newdata=wday.frame,
transform=TRUE)
wday.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.025, .975))})
wday.frame$low <- ci["2.5%",]
wday.frame$high <- ci["97.5%",]
wday.plot <-
wday.frame %>%
ggplot() +
aes(x=weekday, y=Estimate) +
geom_point(size=2) +
geom_errorbar(aes(ymin=low, ymax=high),
width=.2) +
xlab("Day of week") +
ylab("Shootings") +
ggtitle(" ") +
theme_bw()
Here’s the day of the year plot we just constructed. We don’t see a lot of evidence for day-of-year effects: the 95% credible intervals for all these days overlap. The highest and lowest days (after accounting for slower-moving seasonal trends) are marked automatically.
doy.plot
We do see a strong seasonal effect, with summers showing up as particularly bad. We see the largest dips in shootings in February and March, cold and dark months in Baltimore.
seasonal.plot
There is essentially no weekday effect.
wday.plot
Finally, we can use this model to measure the effect of Ceasefires on shootings per day, after accounting for all these trends and seasonalities. The effect of the Ceasefire (plotted here as an odds ratio with 95% credible intervals) is classically statistically significant and suggests an approximate 50% reduction in shootings during ceasefire weekends, after accounting for all other effects (e.g., day of the week, holidays, etc):
We can also use this model to see the impact of the ceasefire at specific points in time. For example, here is the model-estimated impact of the ceasefire on Friday August 2nd, 2019.
### Ceasefire plot
pred.day <- as.Date("2019-08-02")
ceasefire.frame <- with(daily,
data.frame(
jul=julian(pred.day)[1],
weekday="Friday",
ceasefire=factor(c("Regular Day",
"Ceasefire Weekend"),
levels=c("Regular Day",
"Ceasefire Weekend")),
day.of.year=yday(pred.day),
seasonal= yday(pred.day)))
post <- posterior_linpred(model,
newdata=ceasefire.frame,
transform=TRUE)
ceasefire.frame$Estimate <- apply(post,2, median)
# 95% CI
ci <- apply(post,2,function(x){quantile(x, prob=c(.25, .75))})
ceasefire.frame$low <- ci["25%",]
ceasefire.frame$high <- ci["75%",]
# 50% posterior predictive interval for main plot
preds <- posterior_predict(model,
newdata=ceasefire.frame,
transform=TRUE)
ceasefire.frame$high.ppd <- apply(preds,2,function(x){quantile(x, prob=c(.75), na.rm=T)})
ceasefire.frame$low.ppd <- apply(preds,2,function(x){quantile(x, prob=c(.25), na.rm=T)})
ceasefire.frame %>%
ggplot() +
aes(x=ceasefire, y=Estimate) +
geom_point(aes(y = low.ppd), col="blue", shape=95, size=5) +
geom_point(aes(y = high.ppd), col="blue", shape=95, size=5) +
geom_point(aes(y = Estimate),
size=2) +
geom_errorbar(aes(ymin=low, ymax=high),
width=.2) +
xlab("") +
ylab("Shootings") +
ggtitle("Predicted shooting count for Friday August 8, 2019",
subtitle="with 50% credible intervals (black) and posterior predictive intervals (blue)") +
theme_bw()
Without a ceasefire, we’d expect about four people to get shot each day on average. But because this will be a Ceasefire weekend, the model expects about half that many to be killed.
Notice the little blue horizontal lines drawn at 1 and 3 for the ceasefire weekend estimate. Those marks represent the 50% window for the number of shootings you can expect on any given day. The fact that the lower tick mark rests at 1 and 3 means that there will be 0 shootings on a ceasefire day about 25% of the time, and >3 shootings about 25% of the time.
The ceasefire impact is real and at the same time, our model won’t be surprised by the occurrence of several shootings on each day of ceasefire weekend.